Caching at Different Layers

Learn how caching works at different layers of web applications.

Introduction

When designing an API, we tend to focus on optimizing its performance. Suppose multiple users want to access a Twitter trend and see the Tweets related to it. In response to these network calls, the server retrieves the Tweets related to the trend from different databases and delivers them to each client. Every later request, including an identical repeat from the same client, incurs the same delay, because HTTP is stateless and each request again triggers network calls, server computations, and so on.

The sequence is as follows:

  1. The client sends requests to the server to fetch Tweets related to the top trends.

  2. The server fetches the Tweets from different databases.

  3. The server gets the data from the databases and performs computational operations on it.

  4. The server sends the response back to the client.

  5. If the user makes the same request again, the whole sequence is repeated, with the same delay.

Therefore, we require a mechanism that efficiently handles the issues mentioned above and reduces the client latency. For this purpose, we can use a cache.

What is a cache?

A cache is temporary storage that holds frequently reused API responses. For instance, the cache stores the trends object when a client retrieves it for the first time. If no new trend has appeared and the same client refreshes or revisits the page, the trends object is returned from the cache instead of the server. Therefore, the cache significantly reduces latency and avoids recomputation on the server side. With a cache in place, the HTTP request-response flow is as follows:

  1. The client sends an HTTP request to the server. The request is intercepted by an intermediate cache, which serves it directly if it holds the requested object.

  2. If the cache does not have the requested object, the request is forwarded to the back-end server.

  3. The server sends the response back to the client, and a copy of the requested object is saved in the cache for subsequent requests.
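The three steps above can be sketched as a small read-through cache in Python. This is a minimal illustration, not a real caching product: ReadThroughCache and fetch_from_origin are hypothetical names, the store is a plain in-memory dictionary, and expiration and eviction are deliberately left out.

```python
# Minimal read-through cache illustrating the three-step flow.
# ReadThroughCache and fetch_from_origin are hypothetical names; expiration and
# eviction are deliberately left out.
from typing import Callable, Dict


class ReadThroughCache:
    def __init__(self, fetch_from_origin: Callable[[str], str]) -> None:
        self._fetch_from_origin = fetch_from_origin  # stands in for the back-end server
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> str:
        # Step 1: the request is served by the intermediate cache if possible.
        if url in self._store:
            self.hits += 1
            return self._store[url]
        # Step 2: on a miss, the request is forwarded to the back-end server.
        self.misses += 1
        response = self._fetch_from_origin(url)
        # Step 3: a copy of the response is kept for subsequent requests.
        self._store[url] = response
        return response
```

Calling get("/trends/top") twice invokes fetch_from_origin only once; the second call is answered from the cache.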

A depiction of the cache during client-server communication

Since the cache is essentially a copy of the data in the source of truth (the database), stale or outdated data in the cache is a known issue. To keep the cache consistent with the server, cached entries must be refreshed or expired at regular intervals, for example, by giving each entry a time-to-live (TTL). Separately, because a cache has limited capacity, eviction policies such as LRU (least recently used), LFU (least frequently used), and MRU (most recently used) decide which entries to remove when space is needed.
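The following Python sketch combines the two concerns from the paragraph above: each entry expires after a time-to-live so that stale data is not served, and an LRU policy evicts the least recently used entry when the cache is full. It is an illustration under stated assumptions: LRUCacheWithTTL, capacity, and ttl_seconds are hypothetical names, and the clock is injectable so expiration can be tested deterministically.

```python
# Sketch: an LRU cache whose entries also expire after a TTL.
# LRUCacheWithTTL, capacity, and ttl_seconds are hypothetical names.
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional, Tuple


class LRUCacheWithTTL:
    def __init__(self, capacity: int, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        # Maps key -> (value, expiry time); insertion order tracks recency of use.
        self._entries: "OrderedDict[Hashable, Tuple[Any, float]]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Stale entry: drop it so the caller refetches from the origin.
            del self._entries[key]
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: Hashable, value: Any) -> None:
        if key in self._entries:
            self._entries.move_to_end(key)
        self._entries[key] = (value, self._clock() + self.ttl_seconds)
        if len(self._entries) > self.capacity:
            # Capacity exceeded: evict the least recently used entry.
            self._entries.popitem(last=False)
```

Swapping the eviction rule (for example, to LFU) would change only the bookkeeping in put and get; the TTL check stays the same.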

Point to Ponder

Question

What should be the optimal size of the cache?


The cache size depends on the application’s requirements. For example:

  • Number of users using the application
  • Size and/or type of the data in the application
  • Nature of the application, for example, read-heavy or write-heavy
  • Cache access patterns, for example, whether content is accessed sequentially or randomly

Now we know what a cache is and why it is needed. Data consistency between caches and servers is another detailed topic, but it's beyond the scope of this course. See the Spectrum of Consistency Models lesson in the Grokking Modern System Design Interview for Engineers & Managers course for a discussion on data consistency.

Next, we’ll explore exactly where we can put the cache in our application infrastructure.

Caching at different layers

From the description above, it may seem like caching at the server end would be adequate. However, that is not always the case. We can use caching at different layers in end-to-end communication to reduce latency. Here, we’ll mainly talk about caching at three different layers: client, middleware, and server, as shown in the illustration below.

Caching at different levels in web applications
  • Client-side layer: This layer covers the caches on client devices.

    • Web browser cache: The browser checks whether the required resources, such as the HTML, CSS, and multimedia files needed to build a website, are available locally and, if so, uses them without contacting the server. Using local data is usually much faster than the alternatives. For example, if a request initially takes several seconds because of a slow Internet connection, the next time the browser can use its local copy without going to the network and respond within milliseconds.

  • Middleware layer: This layer covers the caches in the network between the client and the server.

    • Internet service provider (ISP): The ISP mainly maintains two types of caches. The first is the Domain Name System (DNS) cache, which reduces DNS query latency. The second is a caching proxy server that sits between the client and the origin server.

    • DNS: The main job of DNS is to resolve a domain name to an IP address. Without caching, the DNS resolver may need multiple round trips to different servers (root, top-level domain, and authoritative name servers) to get the IP address for the requested domain. DNS caching stores previously resolved records, including the addresses of top-level domain servers, for their time-to-live, which helps the resolver return the IP address with low latency.

    • CDNs: A content delivery network (CDN) is a large, geographically distributed cache that serves numerous clients requesting the same data. CDNs mostly serve static objects from servers close to the clients.

  • Server-side layer: This layer reduces server-side burden by using in-memory caching systems.

    • API gateway cache: This cache stores responses to frequently requested API calls to avoid recomputing the same results. On similar subsequent requests, the response is served from the cache instead of through downstream calls. It can store any data that can be transmitted over HTTP. The API gateway determines whether a request matches a previous one by analyzing its request headers and query parameters. Because, for the most part, only GET requests are cached, the gateway does not need to analyze the request payload.

    • Web server cache: The web server cache stores the most frequently requested static web pages. In case of dynamic data, the request is forwarded and handled by the application server.

    • Application server cache: Data is normally stored in the database, and fetching it from disk takes much longer than reading it from RAM. This layer keeps frequently accessed data objects in memory in different formats, and custom caching solutions, such as DynaCache, can be used on the application server.

    • Database cache: The database cache is used to store the responses of queries that take time to execute and are frequently called, thereby reducing the query response time.
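The API gateway cache described above has to decide when two requests are "the same". Here is a minimal Python sketch of such a cache key, assuming that only GET requests are cached and that the response varies only by path, query parameters, and a hypothetical list of headers (VARY_HEADERS). The function name cache_key is also hypothetical; real gateways make these choices configurable.

```python
# Sketch: deriving a cache key for a GET request at an API gateway.
# VARY_HEADERS and cache_key are hypothetical; a real gateway would make the
# header list configurable per route.
from typing import Mapping, Optional
from urllib.parse import parse_qsl, urlencode, urlsplit

VARY_HEADERS = ("accept", "accept-language")  # assumed headers that change the response


def cache_key(method: str, url: str, headers: Mapping[str, str]) -> Optional[str]:
    """Return a cache key, or None when the request should bypass the cache."""
    if method.upper() != "GET":
        return None  # only GET requests are cached, so the body is never inspected
    parts = urlsplit(url)
    # Sort query parameters so ?a=1&b=2 and ?b=2&a=1 map to the same entry.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # Header names are case-insensitive, so normalize them before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    header_part = "|".join(f"{h}={lowered.get(h, '')}" for h in VARY_HEADERS)
    return f"GET {parts.path}?{query} {header_part}"
```

With this key, /trends?country=US&limit=10 and /trends?limit=10&country=US hit the same cache entry, while a different Accept header produces a separate entry.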

Note: Distributed cache solutions like memcached and Redis are quite popular server-side caching solutions.

Now we know where to place the cache, but the next question is how to identify content for caching and validate it with the server. For that purpose, we have HTTP caching headers. Let's discuss them.

HTTP caching headers

HTTP is at the core of web APIs and has built-in support for caching. When the server sends an HTTP response to the client, it also sends cache headers, which indicate whether, and on which caching layers, the response can be cached. This section explores the different HTTP cache headers and what each one is used for.

HTTP uses headers to set caching policies for the client and for intermediate/shared caching devices. When a client sends its first request for a resource, the request reaches a middleware cache; if the middleware does not have the resource, it forwards the request to the origin server. The server responds with the resource along with caching instructions in the caching headers. The illustration below shows the process in detail.

Depiction of caching headers:

  1. The client sends a request to the CDN.

  2. The CDN forwards the request to the web server.

  3. The web server responds with the HTTP response headers cache-control: public, max-age=xxx and Last-Modified: xxx.

  4. The CDN returns the response to the client with the same headers.

The illustration above shows that the origin (web) server responds with the resource along with caching instructions in the Cache-Control header. The public and max-age directives indicate that the resource can be cached by both the intermediate caching devices and the clients for a specified time period.
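A cache has to split the Cache-Control header into its individual directives before it can act on them. The Python sketch below is a simplified parser written for illustration (parse_cache_control is a hypothetical name): it assumes comma-separated directives and does not handle quoted arguments that themselves contain commas.

```python
# Sketch: splitting a Cache-Control header into its directives.
# Simplified: quoted arguments containing commas are not supported.
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        # Directive names are case-insensitive; arguments may be quoted.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives
```

For the response in the illustration, parse_cache_control("public, max-age=3600") yields {"public": None, "max-age": "3600"}, telling every caching layer that it may store the resource for an hour.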

Primarily, the caching headers describe policies in the following disciplines:

  • Cacheability: This describes whether the content can be cached at all. For example, certain content may be cached for a specified amount of time, while other content cannot be cached at all.

  • Scope: The scope describes the caching layers on which the content may be stored. Some content may be stored on the client side but not on the middleware. For example, personalized content is best cached on the client side, whereas popular public Tweets are well suited to middleware caches, such as CDNs.

  • Expiration: This describes how long content may be stored on a caching layer before it is considered stale. In certain cases, this expiration time may be extended.

  • Validation: Because cached content routinely expires, caching headers must allow a cache to validate its copy with the origin server and, if necessary, replace it with a fresh one before fulfilling incoming requests.

The Cache-Control header, through its directives, is used in both client requests and server responses. Other headers are also needed to validate the content. Let's discuss the headers that support these caching policies.

Caching through HTTP Headers

  • Cacheability (Cache-Control header)

    • no-cache: The content can be cached, but it must be validated with the server before each reuse.

    • no-store: The content must not be cached.

  • Scope (Cache-Control header)

    • private: The content can be cached only by the client. It is good to use when we have to cache personal content.

    • public: The content can be cached by the client, server-side, and middleware caches.

  • Expiration (Cache-Control header)

    • max-age: The content can be cached for the given time in seconds.

    • s-maxage: Similar to max-age, but it applies only to shared caches (shared caches are accessed by multiple clients and store only responses common to these clients).

    • must-revalidate: Once the content becomes stale, the cache must revalidate it with the server before sending it to the client. In some cases (for example, if the origin server is temporarily unavailable), HTTP allows a cache to return stale responses. This directive forbids that: the cache must either validate the response with the server or return an error (typically 504 Gateway Timeout).

  • Validation

    • ETag: An identifier, provided by the origin server, for a specific version of a resource. If the cache holds an outdated copy of the content, the server can send the newer version. The ETag header is used together with another caching header, If-None-Match.

    • If-None-Match: Coupled with ETag, this header checks whether the server has a newer version of the data. For example, the client can send If-None-Match: <ETag-of-resource> to the server. The server replies either with 200 OK and the new version of the content or with 304 Not Modified to show that the client already has the current version.

    • Last-Modified: Its value is a date-time indicating when the content was last modified. The server sends this timestamp along with the resource in the response. This header is used together with another caching header, If-Modified-Since.

    • If-Modified-Since: This header serves the same purpose as If-None-Match, except that its value is a timestamp. It is coupled with Last-Modified: the server sends the full resource only if it has been modified after the given time; otherwise, it replies with 304 Not Modified.

Note: ETag and Last-Modified are both useful, but ETag can provide more precise information about the resource. For example, if the resource is modified several times within a very short interval, Last-Modified might stay the same across those changes, because HTTP dates have only one-second granularity. An ETag, on the other hand, is expected to change whenever the resource's content changes.
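To show how the validation headers fit together, here is a Python sketch of how an origin server might answer a conditional GET. It is illustrative only: conditional_get and make_etag are hypothetical names, the ETag is assumed to be a truncated SHA-256 hash of the body (one common choice, not something HTTP requires), weak ETags and lists of ETags are ignored, and last_modified must be a timezone-aware UTC datetime.

```python
# Sketch: answering a conditional GET with 200 or 304.
# Assumptions: ETag = truncated SHA-256 of the body; no weak or multiple ETags;
# last_modified is a UTC-aware datetime.
import hashlib
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Tuple


def make_etag(body: bytes) -> str:
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_get(body: bytes, last_modified: datetime,
                    request_headers: Dict[str, str]) -> Tuple[int, Dict[str, str], bytes]:
    etag = make_etag(body)
    headers = {
        "ETag": etag,
        # HTTP dates have one-second resolution, so sub-second changes are invisible here.
        "Last-Modified": format_datetime(last_modified, usegmt=True),
    }
    if_none_match = request_headers.get("If-None-Match")
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_none_match is not None:
        # If-None-Match takes precedence; If-Modified-Since is then ignored.
        if if_none_match.strip() in (etag, "*"):
            return 304, headers, b""
    elif if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since)
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        if last_modified.replace(microsecond=0) <= since:
            return 304, headers, b""
    return 200, headers, body
```

A 304 response carries no body, so the client keeps using its cached copy; any content change produces a new ETag and therefore a full 200 response.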

  • If the max-age and public directives are given together in the Cache-Control header, then the content is cached for the number of seconds specified in the max-age directive on all the caching layers from server to client. If, however, public is replaced by private, then this content is cached only by the client for a given time.
  • If max-age and s-maxage are given together, then the value of max-age applies to the client, while the value of s-maxage overrides max-age for the shared caches on any middleware or server layer.
  • If the server provides both ETag and Last-Modified, then the client sends both If-None-Match and If-Modified-Since for validation, and the server gives If-None-Match precedence. If the content has changed, the server returns the updated version; otherwise, it replies with 304 Not Modified.
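The rules in the bullets above can be condensed into a small Python sketch that computes how long a given cache may reuse a response. It is a simplification for illustration: freshness_lifetime is a hypothetical name, the Expires header and heuristic freshness are ignored, and a response without an explicit lifetime is treated as already stale.

```python
# Sketch: choosing how long a cache may reuse a response, based on Cache-Control.
# Simplified: Expires, heuristic freshness, and most other directives are ignored.
from typing import Dict, Optional


def freshness_lifetime(cache_control: str, shared_cache: bool) -> Optional[int]:
    """Return seconds the response may be reused without revalidation,
    or None if this cache must not store it at all."""
    directives: Dict[str, str] = {}
    for part in cache_control.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.strip().lower()] = arg.strip().strip('"')
    if "no-store" in directives:
        return None  # no cache may store the response
    if shared_cache and "private" in directives:
        return None  # private responses are meant for the client's own cache only
    if "no-cache" in directives:
        return 0  # may be stored, but must be revalidated before every reuse
    if shared_cache and "s-maxage" in directives:
        return int(directives["s-maxage"])  # s-maxage overrides max-age in shared caches
    if "max-age" in directives:
        return int(directives["max-age"])
    return 0  # no explicit lifetime: this sketch treats the response as already stale
```

For example, with "public, max-age=600, s-maxage=3600", a CDN (shared cache) may reuse the response for 3600 seconds, while the browser uses the 600-second max-age.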

Discussion

As Internet traffic grows, the load on servers increases day by day. Caching helps reduce the load on servers and networks and provides low-latency responses to clients. However, caches have drawbacks: they are expensive, limited in size, and may contain stale data. Furthermore, caches are generally not well suited to storing dynamic content. To gain the advantages of caching without serving stale data, we need proper cache policies that keep the cached content up to date. Also, caching at all layers can be complex, especially when we want to stay consistent with the origin server.

Point to Ponder

Question

Should caching be performed on every layer?


The answer depends on a number of things, including the content type. The server defines the policies through its caching headers. For instance, if the content is static, such as logos, CSS files, or site scripts, then the server sends a public directive in the Cache-Control header to indicate that the content can be cached on any layer. In contrast, if the content is sensitive, such as the user's personal information, then the server sends a private directive so that only the client layer caches the content.
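The decision described in this answer could look like the following Python sketch on the server side. It is only an illustration: cache_control_for is a hypothetical helper, the max-age values are assumptions rather than recommendations, and the final no-cache branch for other content is an assumed default not prescribed by the answer.

```python
# Hypothetical helper mirroring the answer above: sensitive content is private,
# static assets are public. The max-age values are illustrative assumptions.
def cache_control_for(is_sensitive: bool, is_static: bool) -> str:
    if is_sensitive:
        # Personal information: only the user's own client may cache it.
        return "private, max-age=60"
    if is_static:
        # Logos, CSS, scripts: any caching layer (browser, CDN, proxy) may store it.
        return "public, max-age=86400"
    # Anything else (assumed default): cacheable, but revalidated before each reuse.
    return "no-cache"
```

Checking sensitivity first ensures that content which is both static and personal is never marked public.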
